Add poison message handling to the dispatchers by sophiatev · Pull Request #1366 · Azure/durabletask

sophiatev · 2026-06-10T16:47:56Z

This PR introduces poison message handling to the dispatchers. This is done by:

Introducing a new interface, IPoisonMessageHandler, that any orchestration service that supports poison message handling is expected to implement.
In the case of data corruption, the dispatchers will invoke this interface (if provided) to handle the message rather than throwing an exception.
Otherwise, the dispatchers will invoke this interface to determine if a message is "poisoned" and, if so, take remediating action if possible (for example, failing the orchestration or entity call).

The orchestration service is otherwise responsible for determining what to do with the poison message(s) and how to store them.
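
As a rough illustration of the flow described above, here is a minimal, self-contained sketch. Every type in it is a hypothetical stand-in rather than a DurableTask.Core type: `SketchMessage`, `ISketchPoisonHandler`, `HandlePoisonMessage`, `ThresholdPoisonHandler`, and the "dispatch count above a limit" policy are all assumptions made for the example. Only the shape of `IsPoisonMessage` follows the signature quoted in the review below.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a dispatched message; not a DurableTask.Core type.
public sealed class SketchMessage
{
    public string Id { get; set; } = "";
    public int DispatchCount { get; set; }
    public bool IsCorrupted { get; set; }
}

// Hypothetical handler shape. Only IsPoisonMessage mirrors a signature quoted in
// this PR; HandlePoisonMessage is an assumed name for the "handle" hook.
public interface ISketchPoisonHandler
{
    bool IsPoisonMessage(SketchMessage message, out string? reason);
    void HandlePoisonMessage(SketchMessage message, string reason);
}

// Example policy (an assumption): poisoned once dispatched more than a fixed limit.
public sealed class ThresholdPoisonHandler : ISketchPoisonHandler
{
    private readonly int maxDispatchCount;

    public ThresholdPoisonHandler(int maxDispatchCount)
    {
        this.maxDispatchCount = maxDispatchCount;
    }

    // The service decides how poison messages are stored; here, an in-memory list.
    public List<string> Stored { get; } = new List<string>();

    public bool IsPoisonMessage(SketchMessage message, out string? reason)
    {
        bool poisoned = message.DispatchCount > this.maxDispatchCount;
        reason = poisoned
            ? $"Dispatched {message.DispatchCount} times (limit {this.maxDispatchCount})."
            : null;
        return poisoned;
    }

    public void HandlePoisonMessage(SketchMessage message, string reason)
    {
        this.Stored.Add($"{message.Id}: {reason}");
    }
}

public static class SketchDispatcher
{
    // Returns a label describing what the dispatcher did with the message.
    public static string Dispatch(SketchMessage message, ISketchPoisonHandler? handler)
    {
        if (message.IsCorrupted)
        {
            // Without a handler, corruption still surfaces as an exception.
            if (handler == null)
            {
                throw new InvalidOperationException($"Message {message.Id} is corrupted.");
            }

            handler.HandlePoisonMessage(message, "Message data is corrupted.");
            return "handled-corrupted";
        }

        if (handler != null && handler.IsPoisonMessage(message, out string? reason))
        {
            // Remediation placeholder: a real dispatcher would fail the work item here.
            handler.HandlePoisonMessage(message, reason ?? "No reason given.");
            return "failed-as-poison";
        }

        return "processed";
    }
}
```

In this sketch, a dispatcher created without a handler keeps the pre-existing behavior (an exception on corrupted input), which matches the "if provided" wording in the description.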

sophiatev · 2026-06-10T16:54:49Z

            this.Reason = reason;
        }

+        // Private ctor for JSON deserialization (required by some storage providers and out-of-proc executors)


Unrelated to this PR, but this is a bug I found when testing: JSON was not able to deserialize this event because it lacked a zero-argument constructor and the other constructors all had multiple parameters.

sophiatev · 2026-06-10T16:55:44Z

Also unrelated to this PR, but while working on it I realized that this code, which I wrote a while back, made some incorrect assumptions, so I took the opportunity to fix it.

Copilot

Pull request overview

This PR adds an extensibility hook (IPoisonMessageHandler) and integrates poison/invalid message detection into the core dispatchers so that corrupted or “poisoned” inputs can be handled deterministically (e.g., fail orchestration/activity/entity work) instead of always throwing.

Changes:

Introduces IPoisonMessageHandler and wires it into orchestration/activity/entity dispatchers for invalid work items and poison message detection.
Adds structured logging support for poison-message detection (new event ID + event source + log event).
Adds dispatch-count tracking on history events and propagates poison metadata through entity request processing.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
src/DurableTask.Core/Tracing/TraceHelper.cs	Adjusts entity invocation activity ending to better handle partial result sets.
src/DurableTask.Core/TaskOrchestrationDispatcher.cs	Adds poison detection/handling and updates reconciliation to return a drop reason.
src/DurableTask.Core/TaskEntityDispatcher.cs	Adds poison detection/handling for entity messages, plus poison-aware batching/result shaping.
src/DurableTask.Core/TaskActivityDispatcher.cs	Adds poison/invalid handling for activity scheduling messages (including failing poisoned tasks).
src/DurableTask.Core/Logging/StructuredEventSource.cs	Adds a new structured event for poison message detection.
src/DurableTask.Core/Logging/LogHelper.cs	Adds `PoisonMessageDetected` helper overloads emitting structured logs.
src/DurableTask.Core/Logging/LogEvents.cs	Adds a new structured log event type for poison messages.
src/DurableTask.Core/Logging/EventIds.cs	Reserves a new event ID for poison message detection.
src/DurableTask.Core/IPoisonMessageHandler.cs	New interface defining poison detection and handling hooks.
src/DurableTask.Core/History/HistoryEvent.cs	Adds `DispatchCount` to history events for poisoning heuristics/telemetry.
src/DurableTask.Core/History/ExecutionRewoundEvent.cs	Adds a parameterless ctor for JSON deserialization compatibility.
src/DurableTask.Core/Entities/OrchestrationEntityContext.cs	Adds `AbandonAcquire()` to reset lock acquisition state on failure.
src/DurableTask.Core/Entities/EventFormat/RequestMessage.cs	Adds poison metadata fields used during entity request processing.

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

cgillum

Some initial comments. I haven't gone through the dispatcher code yet (those are bigger diffs).

cgillum · 2026-06-11T22:10:19Z

+        /// If the request message is poisoned, the reason it is poisoned.
+        /// Otherwise, null.
+        /// </summary>
+        public string? PoisonReason { get; set; }


What's the use case for PoisonReason in entity request messages?

It's used when generating the failure response for a poisoned entity operation.

Since processing the history event that corresponds to the entity request is decoupled from sending the response, we need to store the poison reason in the request message so we can use it later to populate the failure details.
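
To make the decoupling concrete, here is a small sketch of the two steps. `SketchRequest`, `SketchResponse`, and `SketchEntityBatch` are hypothetical stand-ins invented for illustration. Only the idea of a nullable `PoisonReason` on the request comes from the diff above, and the failure-details format is an assumption.

```csharp
#nullable enable
using System;

// Hypothetical stand-ins for an entity request and its response.
public sealed class SketchRequest
{
    public string Operation { get; set; } = "";

    // Null unless the request was found to be poisoned.
    public string? PoisonReason { get; set; }
}

public sealed class SketchResponse
{
    public string? Result { get; set; }
    public string? FailureDetails { get; set; }
}

public static class SketchEntityBatch
{
    // Step 1: while processing the history event, only record the reason.
    public static void MarkPoisoned(SketchRequest request, string reason)
    {
        request.PoisonReason = reason;
    }

    // Step 2: later, when the response is produced, read the reason back
    // and skip execution for poisoned requests.
    public static SketchResponse BuildResponse(SketchRequest request, Func<string, string> execute)
    {
        if (request.PoisonReason != null)
        {
            return new SketchResponse
            {
                FailureDetails = "Poison message: " + request.PoisonReason,
            };
        }

        return new SketchResponse { Result = execute(request.Operation) };
    }
}
```

Because the reason travels with the request, the code that builds the response needs no access to the history event that triggered the poison decision.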

cgillum · 2026-06-11T22:14:51Z

+        /// <param name="historyEvent">The history event being dispatched.</param>
+        /// <param name="reason">Why the message is considered poisoned.</param>
+        /// <returns><c>true</c> if the message should be treated as poisoned; otherwise <c>false</c>.</returns>
+        public bool IsPoisonMessage(HistoryEvent historyEvent, out string? reason);


Why do we need the backend to decide whether something is a poison message? Shouldn't the dispatcher be making this decision?

We don't, and it definitely can. I thought it would be the responsibility of the "poison message handler" since it's sort of in the name, but this was a somewhat arbitrary choice on my end. I can return responsibility to the dispatchers.

One thing that's tricky about reviewing this interface is that we don't yet have any implementations of it. I'm thinking we should probably implement this for Azure Storage before committing to these new public APIs.

Actually, one thing I remembered is that doing it this way meant I didn't have to pass a maxDispatchCount when instantiating the dispatchers, so we didn't need yet another TaskHubWorker overload to accept this parameter when it creates them. But if we want to keep this decision on the dispatcher side and avoid another parameter, we could expose it via IPoisonMessageHandler.MaxDispatchCount, if that sounds reasonable?
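
For comparison, the alternative floated here (the dispatcher makes the decision, and the handler only exposes the limit) might look roughly like the sketch below. `ISketchHandlerWithLimit`, `FixedLimitHandler`, and `SketchDispatcherCheck` are hypothetical names, and the "greater than the limit" comparison and the minimum value of 1 are assumptions; the thread does not settle the final API.

```csharp
#nullable enable
using System;

// Hypothetical: the handler only exposes the limit; it makes no poison decision.
public interface ISketchHandlerWithLimit
{
    int MaxDispatchCount { get; }
}

public sealed class FixedLimitHandler : ISketchHandlerWithLimit
{
    public FixedLimitHandler(int maxDispatchCount)
    {
        // Argument range check; the minimum of 1 is an assumption for this sketch.
        if (maxDispatchCount < 1)
        {
            throw new ArgumentOutOfRangeException(
                nameof(maxDispatchCount), "The max dispatch count must be at least 1.");
        }

        this.MaxDispatchCount = maxDispatchCount;
    }

    public int MaxDispatchCount { get; }
}

public static class SketchDispatcherCheck
{
    // The dispatcher owns the decision and needs no extra constructor parameter,
    // because the limit is read from the handler it already holds.
    public static bool IsPoisoned(int dispatchCount, ISketchHandlerWithLimit? handler, out string? reason)
    {
        if (handler != null && dispatchCount > handler.MaxDispatchCount)
        {
            reason = $"Dispatch count {dispatchCount} exceeds the limit of {handler.MaxDispatchCount}.";
            return true;
        }

        reason = null;
        return false;
    }
}
```

The trade-off in this shape is that every backend shares the dispatcher's notion of "poisoned" and can only tune the threshold.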

cgillum

Finished reviewing. Just a few more comments (and some responses).

cgillum · 2026-06-11T23:10:49Z

+        /// <param name="historyEvent">The history event being dispatched.</param>
+        /// <param name="reason">Why the message is considered poisoned.</param>
+        /// <returns><c>true</c> if the message should be treated as poisoned; otherwise <c>false</c>.</returns>
+        public bool IsPoisonMessage(HistoryEvent historyEvent, out string? reason);


One thing that's tricky about reviewing this interface is that we don't yet have any implementations of it. I'm thinking we should probably implement this for Azure Storage before committing to these new public APIs.

cgillum · 2026-06-11T23:13:14Z

@@ -1,4 +1,4 @@
-//  ----------------------------------------------------------------------------------
+//  ----------------------------------------------------------------------------------


It might be good to have @sebastianburckhardt review the changes to this file as I'm less familiar with the details of entity dispatching.

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Sophia Tevosyan and others added 17 commits March 18, 2026 17:39

initial implementation

de244ff

fixing the compilation errors

8f6bf9d

addressing copilot comments

ea5a36f

Merge branch 'main' into stevosyan/add-poison-message-handling

3ab0859

fixing the error in the logger where I was incorrectly calling DiscardingMessage

4c7665d

moved the max dispatch count from IOrchestrationService to dispatch parameters

3bd1dc9

updated the implementations to remove all exception-throwing in the case of poison message handling, except for entity unlock requests

2e3c7c2

comment updates

eecc077

fixed a typo, added an argument range check for the max dispatch count

f0fc35d

Apply suggestion from @cgillum

ea7a67f

Co-authored-by: Chris Gillum <cgillum@microsoft.com>

Apply suggestions from code review

c976817

Co-authored-by: Chris Gillum <cgillum@gmail.com>

redid the implementation to follow an interface format

559bb4d

mroe cleanup and PR comments

68dbde9

updated to wait for the async calls

b9a875d

Merge branch 'main' into stevosyan/add-poison-message-handling

8a77b40

fixing a bug related to the results count and something incorrect i had for trace activities

c7d5803

some more updates, adding a private 0 arg constructor to the rewind event for json deserialization, etc.

b091049

Copilot AI review requested due to automatic review settings June 10, 2026 16:47

Copilot started reviewing on behalf of sophiatev June 10, 2026 16:48

github-code-quality Bot found potential problems Jun 10, 2026

Comment thread src/DurableTask.Core/TaskEntityDispatcher.cs Fixed

Comment thread src/DurableTask.Core/TaskOrchestrationDispatcher.cs Fixed

sophiatev commented Jun 10, 2026

Copilot AI reviewed Jun 10, 2026

Comment thread src/DurableTask.Core/TaskOrchestrationDispatcher.cs

Comment thread src/DurableTask.Core/TaskEntityDispatcher.cs Outdated

Potential fix for pull request finding 'Nested 'if' statements can be combined'

868b856

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

Copilot AI review requested due to automatic review settings June 10, 2026 17:00

Copilot started reviewing on behalf of sophiatev June 10, 2026 17:00

Potential fix for pull request finding 'Nested 'if' statements can be combined'

d13c7b3

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

Copilot AI reviewed Jun 10, 2026

Comment thread src/DurableTask.Core/TaskOrchestrationDispatcher.cs

Comment thread src/DurableTask.Core/TaskEntityDispatcher.cs

Comment thread src/DurableTask.Core/TaskEntityDispatcher.cs Outdated

updating the iteration logic in the entity dispatcher

85ae250

cgillum reviewed Jun 11, 2026

addressing the first round of PR comments

7cbb931

Copilot AI review requested due to automatic review settings June 12, 2026 01:38

Copilot started reviewing on behalf of sophiatev June 12, 2026 01:39

Copilot AI reviewed Jun 12, 2026

missed changing one place from error to warning

756d592


3 participants
